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Abstract —The traveling salesman problem (TSP) 
is one of the most challenging NP-hard problems. It 
has widely applications in various disciplines such 
as physics, biology, computer science and so forth. 
The best known approximation algorithm for Sym¬ 
metric TSP (STSP) whose cost matrix satisfies the 
triangle inequality (called ASTSP) is Christofides 
algorithm which was proposed in 1976 and is a |- 
approximation. Since then no proved improvement 
is made and improving upon this bound is a funda¬ 
mental open question in combinatorial optimization. 
In this paper, for the first time, we propose Trun¬ 
cated Generalized Beta distribution (TGB) for the 
probability distribution of optimal tour lengths in a 
TSP. We then introduce an iterative TGB approach 
to obtain quality-proved near optimal approximation, 
i.e., (l+^(^^) K_1 )-approximation where K is the 
number of iterations in TGB and o(>> 1) is the 
shape parameters of TGB. The result can approach 
the true optimum as K increases. 

Index Terms —Symmetric traveling salesman prob¬ 
lem (STSP); Triangle inequality; Random TSP in a 
unit square; TSPLIB instances Approximation ratio; 
k-opt; Computational complexity 

I. Introduction 

The TSP is one of most researched problems in 
combination optimization because of its importance 
in both academic need and real world applica¬ 
tions. For surveys of the TSP and its applications, 
the reader is referred to [Cook,2012][An et al., 
2012][Vygen, 2012] and references therein. 


After 39 years, Christofides’ |-approximation 
algorithm [Christofide, 1976] still keeps the best 
performance guarantee known for the symmetric 
traveling salesman problem satisfying triangle in¬ 
equality (ASTSP), and improving upon this bound 
is a fundamental open question in combinatorial 
optimization, see [Cook, 2012][Gutin et al., 2002] 
and references therein. [Vygen, 2012] also provides 
a detailed survey on new approximation algorithms 
for the TSP. [Johnson et al., 1998] provide a com¬ 
plete comparative study on the local optimization 
methods for TSP. [Cook, 2012] introduces the TSP 
from history to the state-of-the art. David Johnson 
[2014] discusses the importance and applications of 
random TSP 

[Gharan et al., 2011] give a (|-e)-approximation 
for a tiny e >0 using a probabilistic analysis, as 
[Klarreich, 2013] reports that the new progress is 
a 49.99...96% (totally 46 nines replaced by “...” ) 
over the optimum, a tiny margin for “graphical” 
traveling salesman problems. Notice that a very 
small percentage improvement may also be of great 
impact to the total length of large TSP instances. 

[Reiter and Rice 1966] study the cost distribu¬ 
tion of local optima under a gradient maximizing 
search in 39 integer programming problems. Their 
results suggest that the local optima follow a Beta 
distribution. [Golden 1978] examines six problems 
from the TSPLIB archive [15]. The resulting es¬ 
timates of optimal solutions are compared to the 
best solution found by the Lin-Kernighan algorithm 



[13]. The authors found that the Beta distribution is 
“a more appropriate distribution” than the Weibull 
distribution. 

Recently, [Vig and Palekar 2008], apply sampling 
techniques similar to Golden, and use the Lin- 
Kemighan algorithm to find optimal tour costs. 
The authors estimate raw moments from the one 
to four of the probability distribution of optimal 
tour lengths. They use these estimates to fit various 
candidate distributions including the Beta, Weibull 
and Normal cases. Vig and Palekar conclude that 
the Beta distribution yields the best fit. 

More recently, [Stuffle 2009] provide exact so¬ 
lutions to compute the mean, variance, the third 
and fourth central moment of all tour lengths. The 
computational complexity of computing variance, 
the third and fourth central moment is respectively 
0(n 2 ),0(n 4 ) and (n 6 ) where n is the number of 
nodes in a TSP 

A typical probability distribution of all tour 
lengths for a random TSP in a square unit is 
shown in Fig. 1 where the total node number is 
12. An example of TSPLIB Burmal4 is shown in 
Fig.2. Similar results are observed for different total 
number of cities for which all tour lengths can be 
obtained. 

The organization of remaining parts of this paper 
is: our major contributions are summarized in Sec¬ 
tion II, our methods are introduced in Section III, 
and Conclusions and future work are discussed in 
Section IV. 



Fig. 2. The probability density of all tour lengths in 
Burma 14.tsp with 14 nodes 


II. Results 

The main contributions of this work can be 
summarized as follows: 

• We propose Generalized Beta (GB) distribu¬ 
tion as the probability density function of all 
tour lengths distribution in a symmetric TSP in 
Euclidean space (ESTSP), the four parameters 
of the GB can be computed from given ESTSP 
data directly. 

• For the first time, we introduce an iterative 
Truncated GB (TGB) closed-form solution to 
obtain (l+|(^^) K_1 )-approximation for a 
STSP where K is the number of total iterations 
in TGB, and ct(>> 1) is the shape parameter 
of TGB and can be determined once the TSP 
instance is given. The result can approach the 
true optimum as K increases. 



Tour Lengths 


Fig. 1. The probability density of random 12-node TSP in a 
squre unit 


III. Methods 

Firstly, a problem formulation and some prelim¬ 
inaries are provided in this section. 

A. Problem Formulation 

Consider the n-node TSP defined in Euclidean 
space. This can be represented on a complete graph 
G- (V, E ) where V is the set of vertices and E 
is the set of edges. The cost of an edge (u, v) is 
the Euclidean distance (c uv ) between u and v. Let 
the edge cost matrix be C[cij\ which satisfies the 
triangle inequality. 
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Definition 1. Symmetric TSP (STSP) is TSP in 
Euclidean distance (called ESTSP) and the edge 
cost matrix C is symmetric. 

Definition 2. ASTSP is a STSP whose edge 
costs are non-negative and satisfies the triangle 
inequality, i.e., for any three distinct nodes (not 
necessary neighboring) (■ i,j,k ), (c tj +c jfc ) > c lk . 

Definition 3. TSP tour. Given a symmetric graph 
G in 2-dimensional Euclidean distance and its 
distance matrix C where c L j denote the distance 
between node i and j (symmetrically). A tour T 
has length 


N-l 

L = ^2 C T(k),T{k+ 1) (1) 

k =0 


where N is the total number of nodes in G and 
T(N)=T( 0) so that a feasible tour is formed. 
Definition 4. The approximation ratio of an al¬ 
gorithm. The ratio is the result obtained by the 
algorithm over the optimum (abbreviated as OPT 
in this paper). 

Observation 1. The probability density function 
of all tour lengths in an ESTSP can be modelled 
by a Generalized Beta (GB) distribution. 

This is observed in Fig. 1 and other ESTSPs for 
which we can obtain all tour lengths. More results 
are provided in next section. This is also validated 
and shown in [Vig 2008], where a scaled Beta 
distribution is applied with scaled mean and scaled 
variance. The author validated estimated results 
by Anderson-Darling (A-D) test and Kolmogorov- 
Smirnov (K-S) test for random TSP. They use 
these estimates to fit various candidate distributions 
including the Beta, Weibull and Normal cases, and 
conclude that the (scaled) Beta distribution yields 
the best fit. 

We further propose a Generalized Beta (GB) 
distribution. The probability density function (pdf) 
of GB is defined as 


f(x,a,(3,A,B ) 


{x - A) a ~ l {B - xf~ l 
Beta(a , /3) 


( 2 ) 


where Beta(a , (3) is the beta function 


Beta(a , /3) 


l 


i dt. 


(3) 


A and B is the lower bound and upper bound 
respectively, a > 0, (3 > 0, see [Hahn and Sh- 
piro, page 91-98,126-128,1967]. For TSP, A and B 
represents the minimum and maximum tour length 
respectively. 

The four central moments, mean (/i), variance, 
skewness and kurtosis of the Generalized Beta 
distribution with parameters (a, (3 , A, B) are given 
by: 


l‘- A + (B- 


Var = (B - A) 2 


a/3 

(oi + /3) 2 (a + f3 + 1) 


(4) 

(5) 


Skewness 


2 (/? — Ol) y/l + a + [3 
yj a + (3(2 + ol + (3) 


( 6 ) 


and Kurtosis 

6[cy 3 + o: 2 (l — 2/3) + /3 2 (1 + (3) — 2a (3 (2 + /?)] 
a/3(a (3 T - 2)(rr (3 -\- 3) 

The standard deviation is then given by 


a — VVar 


( 8 ) 


Once four central moments are known, or any 
four parameters of ( A , B , /i, Var, skewness, kur¬ 
tosis) are given, then four parameters of GB, i.e., 
(a, (3 , A, B) can be determined easily from the 
four moments match using Eqns.(4)-(7). When the 
problem size is not large, the four central moments 
can be computed exactly using methods proposed 
in [Stuffle 2009]. As the problem size increases, we 
can find any four parameters of ( A , B, /i, Var, skew¬ 
ness) firstly, then find four parameters of GB, i.e., 
(a, p. A, B). 

For medium or large size problem, currently it is 
not easy to find the fourth central moment. How¬ 
ever, the lower bound (A) can be easily computed 
by LKH code [14]. So in the following sections, we 
find four values (A, mean (/i), variance, skewness) 
firstly, and then compute other parameters ( B , a, 
/?). Firstly we introduce a method to compute 
maxTSP (B) [Gutin et al.,2002]. 

Definition 5. maxTSP. The maximum tour length 
(B) is obtained using LKH where each edge cost 
(cij) is replaced by a very large value (M) minus 
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the original edge cost, i.e., ( M-Cij ). M can be set 
as the maximum edge cost plus 1. 

Since the characteristics of random TSP and 
TSPLIB instances are different in Euclidean space, 
we introduce the GB as the probability density 
function for them separately. 

0.03 

0.02 

0.01 

B. GB as the Probability Density Function for 0 

Random TSP 


0.04 I 


For medium size random TSP problems with 
n varying from 20 to 100, we can obtain four 
central moments easily [Suffle 2009] and apply four 
moments match to find four parameters for GB. 
After obtaining four parameters, we then use linear 
regression to find closed-form solution to (a, p) of 
GB for random TSP. For n=20 to n=100, and we 
find that 

a(n) = 1.9197n - 32.166, R 2 = 0.9994 (9) 

0(n) = 1.1168n - 15.854, R 2 = 0.9982 (10) 

A{n) = 0.6932Vn + 0.8029, R 2 = 0.9956 (11) 
B(n) = 0.7649 n - 0.6393, R 2 = 0.998 (12) 


Fig. 3. The relative error between estimated results by GB 
and OPT by LKH for Random TSP n =200 to 500 


C. GB as the Probability Density Function for 
TSPLIB Instances in Euclidean Distance 

Firstly, we show an example of Burmal4.tsp, 
which we can permute its all tours and find 
that minimum tour length (A) is 3233 and 
maximum tour length (B) is 9139, Mean 
(/i)= 6679, Variance=503064, Skewness=-0.0632, 
Kurtosis=2.7972. Fig. 4 shows exact result 
(in black color) by permuting all tour lengths 
and estimated probability density function (in 
green color) by four central moment match with 
(A=3233, £>=9579, c*=13.96,0=11.79). It can be 
observed that two results match very well. 


where R 2 is a measure of goodness-of-fit with value 
between 0 and 1, the larger the better. We observe 
that Eqns.(9)-(12) are highly accurate by extensive 
computation results. 

Observation 2. The relative errors between 
estimated results (maxTSPs) by GB and LKH 
results are within 6.5% for random TSP. 

The relative error is defines as (EstimatedValue- 
OPT)/OPTxlOO%. We conduct tests for n=100 to 
n=500. The results are shown in Fig.3 where LKH 
is used to obtain maxTSP (B) results. We can 
observe that the relative error between our results 
and LKH are within 6.5% off the true optimums, 
with an average below 5%. Table 1 shows four 
parameters of GB for random TSPs with n varying 
from 90 to 99. 



Fig. 4. The exact (Permute All Tours) and estimated (Four 
Moments Match) probability distribution of Burmal4.tsp 

Observation 3. The relative error between the 
estimated maximum tour lengths by GB and LKH 
results is below 7% for medium size TSPLIB 
instances, with an average below 5%. 
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We conduct tests by set n= 14 to n =52 for which 
the four central moments can be easily computed 
and are given in [Stuffle 2009]. Fig. 5 shows the 
relative error between estimated results by GB and 
OPT by LKH. 



Fig. 5. The relative error between estimated results by GB 
and OPT by LKH for TSPLIB instances for n from 14 to 52 


TABLE I 

Four parameters of GB for some random TSP 

INSTANCES 


n 

A(OPT) 

B 

a 

p 

90 

7.18 

65.75 

179.68 

96.49 

91 

7.15 

69.49 

180.38 

96.30 

92 

7.01 

69.18 

185.51 

108.55 

93 

8.00 

71.24 

182.22 

100.63 

94 

7.73 

68.58 

175.38 

94.00 

95 

7.77 

69.53 

181.09 

105.93 

96 

7.40 

73.10 

204.60 

115.61 

97 

8.00 

74.01 

186.85 

107.33 

98 

7.73 

76.44 

208.67 

123.96 

99 

7.54 

74.95 

203.46 

109.03 


In Table 2 we show four parameters of GB for 
TSPLIB with n varying from 14 to 52. 

D. Truncated Generalized Beta Distribution Based 
on Christofides Algorithm 

Next, we introduce our algorithm, Truncated 
Generalized Beta distribution Based on 
Christofides Algorithm (TGB). TGB algorithm 
performs in seven steps: 


TABLE II 

Four parameters of GB for some TSPLIB instances 


TSPLIB 

A(OPT) 

B 

a 

p 

burmal4 

3323 

9139 

13.97 

11.79 

ulysses16 

73.98 

180.52 

10.24 

6.52 

grl7 

2085 

6160 

19.22 

10.60 

gr21 

2707 

10680 

32.95 

19.80 

ulysses22 

75.3 

241.50 

17.52 

12.79 

gr24 

1272 

4929 

51.55 

27.21 

fri26 

937 

3681 

28.87 

16.91 

bayg29 

1610 

6654 

42.17 

26.42 

bays29 

2020 

8442 

45.14 

27.52 


• (2). Taking G restricted to vertices of odd 
degrees in MST as the subgraph G*; This 
graph has an even number of nodes and is 
complete; 

• (3). Finding a minimum weight matching M* 
onG*; 

• (4). Uniting the edges of M* with those of the 
MST to create a graph H with all vertices 
having even degrees; 

• (5). Creating a Eulerian tour on H and reduce 
it to a feasible solution using the triangle 
inequality, a short cut is a contraction of two 
edges (z, j) and (j, k ) to a single edge (z, k)\ 

• (6). Applying Christofides algorithm to a 
ESTSP forms a truncated GB (TGB) for the 
probability density function of optimal tour 
lengths, with expectation (average) value at 
most 1.50PT-e, where e is a very small value; 
Applying fc-opt to the result of Christofides 
algorithm forms another TGB for probability 
density function of optimal tour lengths; 

• (7). Iteratively apply this approach, taking the 
expectation value of (K — l)-th iteration as the 
upper bound of the K- th iteration, we have the 
expectation value after K iterations (K > 2). 

LEMMA 1. Applying Christofides algorithm 
to a ESTSP forms a truncated GB (TGB) for 
the probability density function of optimal tour 
lengths, with expectation (average) value at most 
1.50PT-e, where e is a very small value. 


• (1). Finding the minimum spanning tree MST Proof: This is because that Christofides algo- 

of the input graph G representation of metric rithm assures that its result is at most 1.50PT so 
TSP; that those tours with lengths more than 1.50PT are 
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excluded (truncated), as shown in Fig. 6 where tour 
lengths larger than 1.50PT (1.5A) are truncated in 
black color. 



Fig. 6. The Truncated GB by applying Christofides algorithm 


The TGB in this case is truncated from above. Set 
X as the variate of the GB, the probability density 
function (pdf) of TGB is given by 


ft(x,a,P,A,B,a,b ) 


= f(x,a,P,A ,B) 

Pr[a < X < b] 

{x - A) a ~ l (B - xf~ l 
fa( x ~ A) a ~ l (B - x)P~ l 


(13) 


where a-A and 6=1.5 A. Therefore 1.5A is the 
upper bound. We know that the average of 
Christofides algorithm is at most 1.50PT-e (set as 
fi\) where e is a very small value, this is also 
validated in [Blaser et al., 2012]. 


The four parameters of GB for some random TSP 
and TSPLIB instances are provided in Table 1 and 
2 respectively. 

Definition 6 . k -opt method. Local search with k- 
exchange neighborhoods, also called fc-opt, is the 
most widely used heuristic method for the TSP. k- 
opt is a tour improvement algorithm, where in each 
step k links of the current tour are replaced by k 
links in such a way that a shorter tour is achieved 
(see [Helsgaun 2009] for detailed introduction). 

In [Helsgaun 2009], a method with computational 
complexity of 0(k 3 + k^/n) is introduced for fc-opt. 



Fig. 7. The Iteratively Truncated GB 


Christofides algorithm forms another TGB for prob¬ 
ability density function of optimal tour lengths. 

Proof: Applying fc-opt to the result obtained 
by Christofide algorithm as shown in Fig.7. The 
TGB in this case is truncated from above. Denote 
the first truncation by Christofides’ algorithm as 
the first truncation (K= 1). The probability density 
function of the second TGB is given by 


f?(x,a,/3,A,B,a 2 ,b 2 ) = 


{x - A) a ~ l {B - xf- 1 

rb 2 
>a 2 


(14) 

In this case, a 2 =A, 62 = 1 .5 A because the distribution 
is based on the result after applying Christofides 
algorithm which assures the upper bound is at 
most 1.5A, see Fig.7. Setting ® 2 =^E^= 0 , 

fr 2 = 1 '|l~^ =g=r|, we have 


Co 



A) a ~ l (B - xf~ l dx 


f\{B - A)xY~\{B - A){ 1 
Jo 

c B-A) a+ P- 1 B 2 (0,b 2 ,a,/3 ) 


x)P 1( i£ 

(15) 


where 


B 2 (0,t,a,(3) 



x a ~ l {l - xY~ l dt 


(16) 


LEMMA 2. Applying /c-opt to the result of 
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By the definition of the expectation (mean) value 
(denoted as /j|) for /^(rc, a, A. f?, «2, b 2 ), we 












have 


Ht~A = / (x - A)ft(x,a,/3, A, B,a 2 ,b 2 )dx 


f^( x — A) a (B — x)P 1 dx 

= ^ Cb 
(B-A) a +^B 2 (0,b 2 ,a + l,P) 
C 0 

B 2 (0,b 2 ,a + l,P) 


= (B-A) 


—> Ft — A + {B — A) 


B 2 (0,b 2 ,a,P) 
B 2 (0,b 2 ,a + l,P) 

B 2 (0, b 2 ,a, p) 

(17) 


Taking the expectation value of (K — l)-th iter- 
ation as the upper bound (bx = — B _ A ) °f the 
K- th iteration, we apply this approach Iteratively 
and have the expectation value after K iterations 
(K > 2), denoted as /if , 


T = A + (B-A) 


B 2 (0, bx, a + 1,/3) 
B 2 (0, bx, (3) 


= A + (B-A)g(b K ) 


Next we provide the proof for our main theorem. 
THEOREM 1. Applying TGB iteratively, we 
can obtain quality-proved approximation, i.e., 
(l+|(^^) K_ 1 )- a pp rox imati on where K is the 
number of iterations in TGB, a is the shape pa¬ 
rameter of TGB and can be determined or estimated 
once TSP instance is given. 

Proof: Notice that the expectation value of 
the (K- l)-iteration is taken as the upper bound 

/V K—l_ A 

(bx = b-a ) the K- iteration, as shown in 
Fig.7. Setting 


_ B^XbK, a + h_P) 

B 2 (0, bx, a, P) 


(18) 


The exact expression of g ( h k ) can be stated in a 
hypergeometric series, and 


■ 62 ( 0 , bK ,«, P) — ~^—F(a,l—P,a+l,bK) (19) 

a 

and F(a, b, c , x) 


ab a(a T 1 )b(b T 1) 9 

= H - X + —-7-- - X Z 

c c(c + 1 ) 2 ! 

a (a + l)(a + 2)6(6 + !)(& + 2) 3 

c(c + l)(c + 2)3! ' { ’ 

In all cases, we have a > 1 , /? > 1 , bx G ( 0 , 1 ), 
therefore F(a, 6, c, x) is an monotonic decreasing 
function. We have 


— At (B — A3jg(bfj ^ A T 0.5A 


Qd T 1 


( 21 ) 


ol T 2 

continue this for g(b 3 ), uf 5 ( 64 ), u^,..., so forth, 
we have 

0.5A a + f ](_i 


and 


91k) = 


#2(0, bK^ ft + 1 A) 
# 2 ( 0 , bK, a, P) 

< A b K 

Qd T 2 

= 0.5A (^)^- 1 


( 22 ) 


(23) 


Therefore 


Ft 


K = 21 + (5 - 21 ) 


-62(0, fry, q + 1 , P) 
£ 2 ( 0 , bx, a, P) 
A)g(b K ) 


= A + (B 

*< 1 + 50 m >* 


(24) 


This means that, after the iT-th iterative trunca¬ 
tion, we can obtain the expectation value of (/ if ) 
which is close to the optimum (OPT=A) when K 
increases. Actually, to make the approximation less 
than Co, the TGB algorithm needs K to be at least 

/1 1 log2(C 0 — 1 ) \ — 

Table 3 shows OPT, a, /?, iteration numbers 
and the approximation ratio (Appr) for TSPLIB 
instances with n < 600, where (a, /3) are obtained 
(or estimated) from (A, mean, variance, skewness) 
in Eqns (4)-(7), and K is obtained in by TGB 
algorithm which modifies LKH code. We observe 
that the TGB results are consistent with LKH OPT 
results in most cases, there are only a few cases 
where TGB results are few percentage difference 
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from OPT, with 0.2% off the true optimum on 
the average. For instance, the difference is 7.8% 
for berlin52.tsp and 1.1% for ulysses22.tsp. Our 
results are consistent with [Applegate 2003]. These 
results validate Theorem 1. Notice that LKH code 
performs very fast in practice and our results are 
based on average performance. 

TABLE III 

Four parameters for some TSPLIB instances (n < 
600) 


computing more statistical metrics, analyzing 
the average performance of approximation al¬ 
gorithms and others. 
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a K A Appr 


ulysses22 

17.52 

91 

82.17 

1.0042 

berlin52 

57.40 

101 

9052.43 

1.0900 

pr76 

115.54 

2451 

108159.41 

1.0000 

rat99 

128.18 

1821 

1219.21 

1.0000 

kroAlOO 

137.30 

3366 

21285.39 

1.0000 

pr299 

422.28 

29117 

48194.85 

1.0000 

lin318 

563.15 

39112 

42042.44 

1.0000 

rd400 

735.15 

34936 

15275.79 

1.0000 

d493 

695.93 

129767 

35018.32 

1.0000 

rat575 

892.03 

84814 

6796.36 

1.0000 


IV. Conclusions 

In this paper, for the first time, we proposed 
GB and Truncated Generalized Beta distribution 
(TGB) for the probability distribution of optimal 
tour lengths in a symmetric TSP in Euclidean space. 
Notice that our TGB results are based on expec¬ 
tation (average) value of probability distribution, 
which may be overestimated for the number of 
iterations. In practice, LKH algorithm performs 
very fast, with estimated computational complexity 
of 0(n 2 2 ) [9]. A few possible research directions 
include: 

• Improving the computational complexity. Cur¬ 
rently the Christofides algorithm with mini¬ 
mum perfect matching has computational com¬ 
plexity 0(n 3 ). For large instances, this com¬ 
plexity should be reduced. 

• Find more efficient ways to compute especially 
the third and fourth central moments of a given 
TSP instance. 

• Finding more applications. With closed-form 
probability density function at hand, a lot 
of things can be done better. For instance, 
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